Speed learning on the fly

Authors

  • Pierre-Yves Massé
  • Yann Ollivier
Abstract

The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic gradients, such as SAGA, SVRG, or AdaGrad. Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole performance of the learning trajectory as a function of step size. Importantly, this adaptation can be computed online at little cost, without having to iterate backward passes over the full data.

Introduction

This work aims at improving gradient ascent procedures for use in machine learning contexts, by adapting the step size of the descent as it goes along. Let ℓ_0, ℓ_1, …, ℓ_t, … be functions to be maximised over some parameter space Θ. At each time t, we wish to compute or approximate the parameter θ*_t ∈ Θ that maximises the sum

    L_t(θ) := ∑_{s ≤ t} ℓ_s(θ).    (1)

In the experiments below, as in many applications, ℓ_t(θ) is of the form ℓ(x_t, θ) for some data x_0, x_1, …, x_t, … A common strategy, especially with large data size or dimensionality [Bot10], is the online stochastic gradient ascent (SG)

    θ_{t+1} = θ_t + η ∂_θ ℓ_t(θ_t)    (2)

with step size η, where ∂_θ ℓ_t stands for the Euclidean gradient of ℓ_t with respect to θ. Such an approach has become a mainstay of both the optimisation and machine learning communities [Bot10]. Various conditions for convergence exist, starting with the celebrated article of Robbins and Monro [RM51], or later [KC78]. Other types of results are proved in convex settings. Several variants have since been introduced, in part to improve the convergence of the algorithm, which is much slower in stochastic than in…
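To make the adaptation mechanism concrete, the following is a minimal Python sketch of the general idea on a toy streaming least-squares problem; it illustrates the principle rather than reproducing the authors' exact algorithm. The step size η is itself updated by a gradient step on the instantaneous performance ℓ_t(θ_t), through an online estimate h_t ≈ ∂θ_t/∂η maintained at first order (the Hessian term of the exact recursion is dropped). The meta step size beta, the clipping bound, and the toy data are all illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy streaming data: y_t = <x_t, theta*> + noise (illustrative setup).
    dim = 5
    theta_star = rng.normal(size=dim)

    def sample():
        x = rng.normal(size=dim)
        return x, x @ theta_star + 0.1 * rng.normal()

    def grad(theta, x, y):
        # Gradient of ell_t(theta) = -(x . theta - y)^2 / 2,
        # so ascent on ell_t is descent on the squared error.
        return -(x @ theta - y) * x

    theta = np.zeros(dim)
    eta = 1e-3         # initial step size
    beta = 1e-2        # meta step size for adapting eta (assumed value)
    h = np.zeros(dim)  # online estimate of h_t = d theta_t / d eta

    for t in range(10_000):
        x, y = sample()
        g = grad(theta, x, y)
        # Hypergradient of the current performance with respect to eta:
        # d ell_t(theta_t) / d eta = g . h_t; clipped here for stability.
        eta *= np.exp(beta * np.clip(g @ h, -1.0, 1.0))
        theta = theta + eta * g  # stochastic gradient ascent step (2)
        h = h + g                # first-order update, Hessian term dropped

    print(f"eta = {eta:.2e}, error = {np.linalg.norm(theta - theta_star):.3f}")

The multiplicative update of eta keeps the step size positive; it amounts to taking the meta-gradient step on log η rather than on η itself.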


Related resources

An Initial Study on the Coordination of Rod and Line Hauling Movements in Distance Fly Casting

Background. The double haul is a unique feature of single-handed fly casting and is used in both fly fishing and fly casting competition. The movement behaviour during the double haul has not been investigated in previous research. Objectives. Describe the coordination of the rod and line hauling movements during distance fly casting. Methods. Elite fly casters performed distance castin...


Providing a Bird Swarm Algorithm based on Classical Conditioning Learning Behavior and Comparing this Algorithm with sinDE, JOA, NPSO and D-PSO-C Based on Using in Nanoscience

There can be no doubt that nanotechnology will play a major role in our future technology. Computer science offers more opportunities for quantum and nanotechnology systems. Soft Computing techniques such as swarm intelligence can enable systems with desirable emergent properties. Optimization is an important and decisive activity in structural designing. The inexpensive re...


New Analytic Method for Subgrade Settlement Calculation of the New Cement Fly-ash Gravel Pile-slab Structure

At present, reducing subgrade settlement of soft soil foundation is a key problem in high-speed railway construction. Pile-slab structure is a widely-utilized form of foundation structure to reduce the subgrade settlement in China. In order to save the engineering cost for high-speed railway construction in developing countries, the author developed a pile-slab structure and named it as the new...


Soft Foundation Strengthening Effect and Structural Optimization of a New Cement Fly-ash and Gravel Pile-slab Structure

Reducing the settlements of soft foundation effectively is a critical problem of high-speed railway construction in China. The new CFG pile-slab structure composite foundation is a ground treatment technique which is applied on CFG pile foundation and pile-slab structure composite foundation. Based on the experience of constructing Beijing-Shanghai high-speed railway in China, the settlement-co...


From Traditional Neural Networks to Deep Learning: Towards Mathematical Foundations of Empirical Successes

How do we make computers think? To make machines that fly, it is reasonable to look at the creatures that know how to fly: the birds. To make computers think, it is reasonable to analyze how we think – this is the main origin of neural networks. At first, one of the main motivations was speed – since even with slow biological neurons, we often process information fast. The need for speed motiva...


On the convergence speed of artificial neural networks in the solving of linear systems

Artificial neural networks have the advantages such as learning, adaptation, fault-tolerance, parallelism and generalization. This paper is a scrutiny on the application of diverse learning methods in speed of convergence in neural networks. For this aim, first we introduce a perceptron method based on artificial neural networks which has been applied for solving a non-singula...



Journal:
  • CoRR

Volume: abs/1511.02540   Issue: –

Pages: –

Publication date: 2015